PARALLEL LOOP TRANSFORMATION METHODS FOR RACE DETECTION 
DURING AN EXECUTION OF PARALLEL PROGRAMS 



BACKGROUND OF THE INVENTION 

The present invention relates to parallel loop transformation methods for race
detection during an execution of parallel programs. More particularly, the invention
relates to a race detection method, which is one of the debugging methods for parallel
loop programs.

The development of parallel programs for shared memory multiprocessors poses a number
of difficult problems in comparison to the development of serial programs. The main
difficulty stems from the complexity of program construction and the conspicuous
absence of debugging methods for the parallel errors inherent in a parallel program.
The race detection according to the present invention is a debugging method for races,
which are one class of these parallel errors. Since races result in an unintended
non-deterministic execution of the program, in which the repeatability of an execution
is not guaranteed, they are regarded as one of the most difficult parallel errors: they
prevent even the application of a cyclic debugging method based on breakpoints.

A number of methods have been developed for detecting races for the purpose of
debugging. Among these, the race detection method during an execution of parallel
programs is capable of detecting as well as reporting the occurrence of races during
an execution of the program being debugged. In view of the delicate characteristics of
races at the point of execution, this race detection method is valued as the most
effective. The presently available methods for detecting races during the execution of
a parallel program simultaneously perform the processing and monitoring functions
necessary for detecting races through a software monitoring device which monitors the
whole execution of the program. However, because a parallel program requires a long
running time as a result of massive parallelism, and because software monitoring
dramatically inflates the execution time, a practical implementation of the present
race detection methods during an execution is not simple. In particular, the most
commonly used parallel construct in shared memory parallel programs is the parallel
loop, and it consumes the largest portion of the total program execution time. This is
the area in which most of the research efforts for improving the efficiency of the race
detection methods during an execution are being concentrated.

Hereinafter, the race detection methods for parallel loops according to prior art will
be described in detail with reference to FIG. 1.

FIG. 1 is a work flow chart of the race detection methods for parallel loops 
according to prior art. 

First of all, prior to an execution of a parallel program, the race detection function is
instrumented (S101) so that all iterations of each parallel loop perform the inspection
and monitoring processes for race detection at the point of execution.

Afterwards, the program is executed for the race detection (S102). In general, when the
race detection method according to prior art is applied to a standard parallel loop in
which several thousand or significantly more iterations are performed, performance
deterioration due to a long running time is unavoidable.

SUMMARY OF THE INVENTION 

The object of the present invention is to provide a parallel loop transformation
method for race detection during an execution of parallel programs which can minimize
the number of subjects to be monitored during the execution of parallel programs for an
effective race detection.



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a work flow chart of the race detection methods for parallel loops 
according to prior art; 

FIG. 2 is a work flow chart of the race detection methods for parallel loops 
according to the present invention; 

FIG. 3 is a detailed work flow diagram of the S210 step (static analysis step) as 
described in FIG. 2; 

FIG. 4 is a detailed work flow diagram of the S220 step (loop creation step) as 
described in FIG. 2; 

FIG. 5 is a diagram which shows examples of a configuration diagram of the data 
structure of a condition statement branch determinant string and execution path control 
statement used for the work flow of FIG. 2; and 

FIG. 6 is a detailed work flow diagram of the S230 step (race detection function
instrumentation step) as described in FIG. 2.

DETAILED DESCRIPTION OF THE EMBODIMENTS 

The parallel loop transformation method for race detection during an execution of
parallel programs according to the present invention requires only two monitoring
operations on each identical execution path for race detection, irrespective of the
maximum parallelism. The conventional methods, in contrast, perform many duplicate and
redundant monitoring operations, since all iterations of each loop are monitored even
though the body of each loop includes only a few execution paths. The objective of the
present invention is achieved by minimizing the unnecessary monitoring time required
for such duplicate monitoring.

Hence, the present invention transforms the original parallel loop into a parallel loop
in which the iterations that should be monitored for detecting races can be dynamically
recognized.

In order to obtain the necessary information for the above transformation, a static
analysis method for parallel loops and a method for actively utilizing race detection
devices, so that only the iterations of the parallel loops which are subject to
monitoring are monitored, should be used in sequence.

The parallel loop transformation method for race detection during an execution of
parallel programs according to the present invention, in which the original parallel loop
is transformed into a full race covering loop for race detection during the execution of
parallel loop programs, comprises: a static analysis step of generating the data
structure of a condition statement branch determinant string Cstr required for loop
transformation, taking the parallel loop as an input, and extracting the execution path
information; a parallel loop transformation step of transforming the parallel loop into
a full race covering loop using said data structure of a condition statement branch
determinant string Cstr and said execution path information; a race detection function
instrumentation step of instrumenting the race detection function in order to activate
the race detection function for the transformed parallel loop which is generated at said
parallel loop transformation step; and a race detection execution step of executing race
detection while running the parallel program according to the instrumented detection
functions which are determined at said race detection function instrumentation step.

The static analysis step further comprises: an input step of sequentially receiving
each statement of each parallel loop body in order to generate a single Cstr data
structure for each parallel loop; an assignment step of assigning a bit variable which
can store a true or false value to the corresponding if-statement if said input
statement is an if-statement; and an extraction step of extracting the Cstr data
structure and the number of execution paths for each parallel loop through an arbitrary
path analyzer after assigning said bit variable.

Also, the parallel loop transformation step further comprises: a determination step of
determining whether or not the input statement is the first statement after a new
statement of the loop body is inputted; an insertion step of inserting an execution path
control statement prior to the input statement if the inputted statement is determined
to be the first statement, the execution path control statement dynamically assigning an
appropriate value to Cstr so that each iteration has an intended execution path and the
duplicate monitoring for race detection on the parallel loop is minimized; a
substitution step of substituting the conditional equation C1 by the conditional
statement (Cstr[c_con_bit].eq.1) ∧ ((¬C1) ∨ (C1)) if the present statement is determined
to be a conditional statement after the execution path control statement is inserted;
and a repeating step of repeating the above actions until the inputted statement is
determined to be the last statement. Here, if the present statement is not an
if-statement, the input statement is maintained as it is, and the above processes are
repeated until the parallel loop is transformed into a full race covering loop.

The above execution path control statement determines the value of Cstr which is to 
be used for determining the execution path of the loop body from the present iteration 
using the value of the present loop control variable of each iteration. 

The above substituted conditional statement determines the branching of the present
conditional statement using the Cstr value corresponding to the present conditional
equation while maintaining the semantics of the original conditional equation.
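
The effect of the substitution can be checked with a short Boolean derivation. In this
sketch, b stands for the test Cstr[c_con_bit].eq.1; the symbol b is introduced here only
for illustration and does not appear in the specification.

```latex
\begin{align*}
b \land (\lnot C_1 \lor C_1) &= b \land \mathrm{true} \\
                             &= b
\end{align*}
```

Read this way, the direction of the branch depends only on the Cstr bit, while C1 still
appears in the substituted statement, which is one way to understand the remark that the
semantics of the original conditional equation are maintained.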

The race detection function instrumentation step further comprises: a determination
step of determining whether the statement inputted in order to embed an appropriate race
detection function into the transformed parallel loop is the beginning or ending
statement of the parallel loop; an insertion step of inserting a label creation
statement and an end statement which function, at the front and the end respectively,
only on iterations numbering no more than twice the number of execution paths, if the
inputted statement is determined to be either the beginning or the ending statement of
the parallel loop; an inspection step of inspecting whether or not the present statement
includes an access event on shared variables, if the inputted statement is neither the
beginning nor the ending statement of the parallel loop; and an instrumentation step of
instrumenting an inspection statement, which inspects whether or not the access event
participates in a race and which functions only on iterations numbering no more than
twice the number of execution paths, if the present statement includes an access event
on shared variables; the above steps being repeated until the last statement is
inputted.

The parallel loop transformation method for race detection during an execution of
parallel programs according to the present invention implements all the executable
instructions of a digital processing apparatus according to their types.

The read/write process of the digital processing apparatus comprises: a static analysis
step of generating the data structure of a condition statement branch determinant string
Cstr required for loop transformation, taking the parallel loop as an input, and
extracting the execution path information; a parallel loop transformation step of
transforming the parallel loop into a full race covering loop using said data structure
of a condition statement branch determinant string Cstr and said execution path
information; a race detection function instrumentation step of instrumenting the race
detection function in order to activate race detection within the necessary iteration
instances of the transformed parallel loop which is generated at said parallel loop
transformation step; and a race detection execution step of executing race detection
while running the parallel program according to the instrumented detection functions
which are determined at said race detection function instrumentation step.

Hereinafter, preferred embodiments of the present invention will be described in
detail with reference to the accompanying drawings.

FIG. 2 is a work flow chart of the race detection methods for parallel loops 
according to the present invention. 

First of all, taking each parallel loop as an input, the necessary information for a
parallel loop transformation is extracted through a static analysis of the loop body
(S210). After this information is extracted, the transformed parallel loops are created
from the extracted information and the original parallel loops (S220). At this point,
once the full race covering loop has been created, the race detection function is
instrumented for each statement in the parallel loop. To be more specific, the necessary
race detection functions are instrumented in the transformed parallel loop (S230) in
order to allow the monitoring of races through real executions. This step is also
commonly used in the standard race detection methods during an execution of parallel
programs. The present invention adds a method which activates the instrumented race
detection function only on iterations numbering no more than twice the number of
execution paths. Once the race detection function has been instrumented in this way, the
transformed parallel loop is executed with the instrumented race detection function in
order to detect races (S240). Here, the S210 step concerning the static analysis of each
parallel loop body will be described in detail with reference to FIG. 3.

FIG. 3 is a detailed work flow diagram of the S210 step (static analysis step) as
described in FIG. 2.

First of all, each statement included in the parallel loop is inputted, and it is
determined whether or not the presently analyzed statement is an if-statement. If the
presently analyzed statement is an if-statement, then a space is allocated for it in the
data structure of the condition statement branch determinant string Cstr (S213). The
Cstr controls the values of the conditional equations of the if-statements which
determine the execution paths within the parallel loop body, so that the iterations of
the parallel loop whose execution paths are required to be monitored can be represented
as predictable patterns. More specifically, one bit is added to Cstr for each new
conditional statement.

A detailed structure of the Cstr is shown in FIG. 5. FIG. 5 is a diagram which shows
examples of a configuration diagram of the data structure of the condition statement
branch determinant string and of the execution path control statement used for the work
flows of FIG. 3 and FIG. 4. As shown in FIG. 5, the Cstr of each parallel loop has one
bit for each conditional statement in order to represent the true or false value of that
conditional statement. The statements which have undergone the above processes are fed
into an execution path analyzer, which analyzes the number of execution paths of the
loop body (S214). If the presently analyzed statement is determined not to be an
if-statement in the S212 step, it is inputted to the path analyzer directly.

Afterwards, it is determined whether or not the presently analyzed statement is the
last statement. If it is not, the path is examined through the identical method as
shown previously. If it is indeed the last statement, then the Cstr information and the
number of execution paths are outputted (S216). More specifically, the path analysis is
continued until the last statement of the parallel loop is received, and finally the
necessary information for the parallel loop transformation is extracted by outputting
the Cstr information and the number of execution paths.
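
The static analysis described above can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the path analyzer of the invention:
the loop body is assumed to be already parsed into a small tree, and the names Stmt, If
and analyze are hypothetical. The sketch allocates one Cstr bit per if-statement and
counts the execution paths of the loop body.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Stmt:
    """A plain (non-branching) statement of the loop body."""
    text: str


@dataclass
class If:
    """An if-statement with a then-branch and an optional else-branch."""
    cond: str
    then_body: List["Node"] = field(default_factory=list)
    else_body: List["Node"] = field(default_factory=list)
    bit: int = -1  # index of this statement's bit in Cstr, set by analyze()


Node = Union[Stmt, If]


def analyze(body: List[Node]) -> Tuple[int, int]:
    """Return (number of Cstr bits, number of execution paths) for a loop body."""
    n_bits = 0

    def count_paths(nodes: List[Node]) -> int:
        nonlocal n_bits
        paths = 1
        for node in nodes:
            if isinstance(node, If):
                node.bit = n_bits  # allocate one Cstr bit per if-statement
                n_bits += 1
                # Either branch may be taken, so the alternatives add up;
                # consecutive statements multiply.
                paths *= count_paths(node.then_body) + count_paths(node.else_body)
            # A plain statement does not change the number of paths.
        return paths

    n_paths = count_paths(body)
    return n_bits, n_paths
```

For a body with two consecutive if-statements, analyze reports two bits and four paths;
for one if-statement nested inside the then-branch of another, it reports two bits and
three paths.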

Here, the S220 step concerning the creation of a full race covering loop for each
parallel loop will be described in detail with reference to FIG. 4.

FIG. 4 is a detailed work flow diagram of the S220 step (loop creation step) as 
described in FIG. 2. 

As shown in FIG. 3, once the static analysis of each loop body is completed, each
statement of the parallel loop body is inputted (S221) and it is determined whether or
not the inputted statement is the first statement (S222). If the input statement is
indeed the first statement, then an execution path control statement is inserted prior
to the input statement (S223).

The execution path control statement dynamically allocates an appropriate Cstr value
on each iteration so that each iteration has an intended execution path, consequently
minimizing the duplicate race detection monitoring for the parallel loops. The basic
form of an execution path control statement is as shown in FIG. 5. Using the present
loop control variable value of each iteration, the Cstr value for determining the loop
body execution path of the present iteration is obtained.

This process is only required when the present statement is the first statement. After
going through the S223 step, it is determined whether or not the present statement is an
if conditional statement (S224). If the presently inputted statement is not the first
statement in the S222 step, the process proceeds straight to S224 without going through
the S223 step. If the present statement is an if-statement in the S224 step, then the
conditional equation C1 of the conditional statement is substituted by the conditional
statement (Cstr[c_con_bit].eq.1) ∧ ((¬C1) ∨ (C1)). The substituted conditional statement
determines the branching of the present conditional statement using the Cstr value
corresponding to the present conditional equation while maintaining the semantics of the
original conditional equation. Then, it is determined whether or not the presently
inputted statement is the last statement (S226). If it is not the last statement, the
identical process is continued until the last statement is received. At that point, the
arbitrary parallel loop has been transformed into a full race covering loop.

If the present statement is not an if-statement in the S224 step, then the present
statement is maintained as it is and the above process is continued until the last
statement is received in order to transform the arbitrary parallel loop into a full race
covering loop.
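
The behaviour of a transformed loop can be illustrated with a small Python sketch. It is
a toy under explicit assumptions, not the form shown in FIG. 5: the loop body has two
independent if-statements (two Cstr bits, four paths), iteration i is assumed to be
steered to path number i mod 4, and the names path_bits, guarded and
run_transformed_loop are hypothetical.

```python
from typing import Callable, List, Tuple


def path_bits(i: int, n_bits: int, n_paths: int) -> List[int]:
    """Hypothetical execution path control statement.

    Assumption: iteration i is steered to path number (i mod n_paths), and the
    binary digits of that number are used as the Cstr bits.
    """
    path = i % n_paths
    return [(path >> k) & 1 for k in range(n_bits)]


def guarded(cstr: List[int], bit: int, c1: Callable[[], bool]) -> bool:
    """Substituted condition: (Cstr[bit] == 1) AND ((NOT C1) OR C1).

    C1 is evaluated once here, but the branch outcome is decided by the
    Cstr bit alone, because (NOT C1) OR C1 is always true.
    """
    value = c1()
    return cstr[bit] == 1 and ((not value) or value)


def run_transformed_loop(n_iterations: int) -> List[Tuple[str, str]]:
    """Run a toy loop body with two independent if-statements (4 paths)."""
    n_bits = 2
    n_paths = 2 ** n_bits  # holds only because the two ifs are independent
    taken = []
    for i in range(n_iterations):  # stands in for a parallel loop
        cstr = path_bits(i, n_bits, n_paths)  # execution path control statement
        first = "then1" if guarded(cstr, 0, lambda: i % 3 == 0) else "else1"
        second = "then2" if guarded(cstr, 1, lambda: i > 5) else "else2"
        taken.append((first, second))
    return taken
```

Under these assumptions, the first four iterations each take a different path and the
next four repeat them, so eight iterations are enough for every path to be executed
twice regardless of how many iterations the loop has in total.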

Here, the step concerning the instrumentation of race detection function as described 
in FIG. 2 will be described in detail with reference to FIG. 6. 

FIG. 6 is a detailed work flow diagram of the S230 step (race detection function 
instrumentation step) as described in FIG. 2. 

First of all, this process can operate in parallel with the full race covering loop
creation step as described in FIG. 4. The process makes it possible to activate race
detection only for the iterations to be monitored in the transformed parallel loop.

In the race detection function instrumentation step as described in FIG. 6, when each
of the statements of the parallel loop is inputted (S232), it is determined whether the
presently inputted statement is the beginning or ending statement of the parallel loop.

After the above determination, if the presently inputted statement is the beginning
or ending statement of the parallel loop, then the label creation statement or the
ending statement, respectively, which functions only on iterations numbering no more
than twice the number of execution paths, is instrumented and inserted.

The instrumentation for the types of the label creation statements and ending
statements is beyond the scope of the present invention. The corresponding
conventional methods can be utilized.

If the new input statement in the S232 step is neither the beginning nor the ending of
a parallel loop, the present statement is inspected to see whether it includes any
access event on shared variables (S234). If the present statement includes such access
events, then the statement which inspects these access events is instrumented so that it
functions only on iterations numbering no more than twice the number of execution paths
(S235). Afterwards, it is determined whether or not the present statement is the last
statement (S236). If the present statement is indeed the last statement, the race
detection instrumentation process is terminated. Otherwise, the race detection
instrumentation process is continued for the next statement according to the methods
shown above.
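
The instrumentation pass can be sketched as follows. This is only an illustration under
stated assumptions: statements are plain strings, the loop is delimited by hypothetical
"parallel do" and "end parallel do" lines, the monitored iterations are assumed to be
the first 2 × (number of execution paths) values of a 1-based loop index i, and
make_label, end_label and check_access are placeholder names for the conventional label
creation, ending and inspection statements.

```python
import re
from typing import List, Set


def instrument(loop: List[str], shared: Set[str], n_paths: int) -> List[str]:
    """Return the loop with guarded race detection statements inserted."""
    # Assumption: only iterations 1 .. 2 * n_paths are monitored.
    guard = f"if (i .le. {2 * n_paths})"
    out = []
    for stmt in loop:
        s = stmt.strip()
        if s.startswith("parallel do"):
            # Beginning of the loop: label creation follows the header.
            out.append(stmt)
            out.append(f"{guard} call make_label()")
        elif s.startswith("end parallel do"):
            # End of the loop: ending statement precedes the footer.
            out.append(f"{guard} call end_label()")
            out.append(stmt)
        else:
            # Any identifier that names a shared variable counts as an access.
            names = set(re.findall(r"[A-Za-z_]\w*", s))
            for var in sorted(names & shared):
                out.append(f"{guard} call check_access({var})")
            out.append(stmt)
    return out
```

Calling instrument on a four-line loop whose body touches the shared variable x inserts
one guarded label creation, one guarded check before the statement that accesses x, and
one guarded ending statement; statements that touch no shared variable are left as they
are.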

As shown so far, the present invention provides a race detection method during an
execution of parallel programs which is one of the debugging methods for parallel loop
programs. Using the information obtained from a static analysis of parallel loop bodies,
the monitoring time for race detection is reduced by transforming the loop bodies so
that only the iterations necessary for race detection are dynamically monitored during
the execution.

Specifically, the conventional monitoring methods typically consume a long time since
they monitor all iterations of each parallel loop in parallel loop programs. By
monitoring only twice the number of execution paths, irrespective of the parallelism of
each parallel loop, the present invention can significantly reduce the execution time.
As a result, the present invention allows a convenient race detection of parallel loop
programs, thereby improving the effectiveness of race detection.